5 research outputs found

    Developing a manually annotated clinical document corpus to identify phenotypic information for inflammatory bowel disease

    Abstract
    Background: Natural Language Processing (NLP) systems can be used for specific Information Extraction (IE) tasks such as extracting phenotypic data from the electronic medical record (EMR). These data are useful for translational research and are often found only in free-text clinical notes. A key required step for IE is the manual annotation of clinical corpora and the creation of a reference standard for (1) training and validation tasks and (2) focusing and clarifying NLP system requirements. These tasks are time-consuming, expensive, and require considerable effort on the part of human reviewers.
    Methods: Using a set of clinical documents from the VA EMR for a particular use case of interest, we identify specific challenges and present several opportunities for annotation tasks. We demonstrate specific methods using an open source annotation tool, a customized annotation schema, and a corpus of clinical documents for patients known to have a diagnosis of Inflammatory Bowel Disease (IBD). We report clinician annotator agreement at the document, concept, and concept attribute levels. We estimate concept yield in terms of annotated concepts within specific note sections and document types.
    Results: Annotator agreement at the document level for documents that contained concepts of interest for IBD, using an estimated Kappa statistic (95% CI), was very high at 0.87 (0.82, 0.93). At the concept level, F-measure ranged from 0.61 to 0.83. However, agreement varied greatly at the specific concept attribute level. For this particular use case (IBD), the clinical documents producing the highest concept yield per document included GI clinic notes and primary care notes. Within the various types of notes, the highest concept yield was in sections representing patient assessment and history of presenting illness. Ancillary service documents and family history and plan note sections produced the lowest concept yield.
    Conclusion: Challenges include defining and building appropriate annotation schemas, adequately training clinician annotators, and determining the appropriate level of information to be annotated. Opportunities include narrowing the focus of information extraction to use-case-specific note types and sections, especially in cases where NLP systems will be used to extract information from large repositories of electronic clinical note documents.
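
    The agreement measures reported above (a Kappa statistic at the document level, F-measure at the concept level) can be computed from paired annotator judgments. The following is a minimal illustrative sketch, not drawn from the study's data; the label vectors are invented and scikit-learn is assumed to be available.

```python
# Illustrative sketch only: the annotation labels below are invented,
# not data from the study. Requires scikit-learn.
from sklearn.metrics import cohen_kappa_score, f1_score

# Document-level judgments from two annotators:
# 1 = document contains an IBD concept of interest, 0 = it does not.
annotator_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
annotator_b = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# Chance-corrected agreement at the document level.
kappa = cohen_kappa_score(annotator_a, annotator_b)

# Concept-level agreement is often summarized as an F-measure,
# treating one annotator as the reference standard.
f1 = f1_score(annotator_b, annotator_a)

print(f"kappa = {kappa:.2f}, F-measure = {f1:.2f}")
```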

    Emergency department documentation templates: variability in template selection and association with physical examination and test ordering in dizziness presentations

    Abstract
    Background: Clinical documentation systems, such as templates, have been associated with process utilization. The T-System emergency department (ED) templates are widely used, but analyses of the templates' association with care processes are lacking. This system is also unique because of the many different template options available, so the selection of the template itself may also be important. We aimed to describe the selection of templates in ED dizziness presentations and to investigate the association between items on templates and process utilization.
    Methods: Dizziness visits were captured from a population-based study of EDs that use documentation templates. Two relevant process outcomes were assessed: head computerized tomography (CT) scan and nystagmus examination. Multivariable logistic regression was used to estimate the probability of each outcome for patients who did or did not receive a relevant-item template. Propensity scores were also used to adjust for selection effects.
    Results: The final cohort was 1,485 visits. Thirty-one different templates were used. Use of a template with a head CT item was associated with an increase in the adjusted probability of head CT utilization from 12.2% (95% CI, 8.9%-16.6%) to 29.3% (95% CI, 26.0%-32.9%). The adjusted probability of documentation of a nystagmus assessment increased from 12.0% (95% CI, 8.8%-16.2%) when a nystagmus-item template was not used to 95.0% (95% CI, 92.8%-96.6%) when a nystagmus-item template was used. The associations remained significant after propensity score adjustments.
    Conclusions: Providers use many different templates in dizziness presentations. Important differences exist among the various templates, and the template that is used likely affects process utilization, even though its selection may be arbitrary. The optimal design and selection of templates may offer a feasible and effective opportunity to improve care delivery.
    http://deepblue.lib.umich.edu/bitstream/2027.42/112490/1/12913_2010_Article_1586.pd
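
    The analysis described above combines multivariable logistic regression with a propensity score for template selection. Below is a minimal sketch of that general approach under stated assumptions: the data are simulated, the covariates (age, sex) are hypothetical, and the code uses statsmodels rather than whatever software the study used.

```python
# Hypothetical sketch of outcome modeling with a propensity-score adjustment;
# the simulated data below stand in for the study's visit-level records.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1485  # cohort size reported in the abstract
visits = pd.DataFrame({
    "age": rng.normal(55, 18, n).round(),
    "female": rng.integers(0, 2, n),
    "ct_item_template": rng.integers(0, 2, n),  # template contains a head CT item
})
# Simulated outcome: head CT more likely when the template has a CT item.
logit_p = -2.0 + 1.2 * visits["ct_item_template"] + 0.01 * visits["age"]
visits["head_ct"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Propensity score: probability of receiving a CT-item template given covariates.
ps_model = smf.logit("ct_item_template ~ age + female", data=visits).fit(disp=0)
visits["propensity"] = ps_model.predict(visits)

# Outcome model: template exposure plus covariates and the propensity score.
outcome_model = smf.logit(
    "head_ct ~ ct_item_template + age + female + propensity", data=visits
).fit(disp=0)
print(outcome_model.summary())
```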

    Automatic de-identification of textual documents in the electronic health record: a review of recent research

    Abstract
    Background: In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Institutional Review Board to use data for research purposes, but these requirements can be waived if data are de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" technique requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often performed manually and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here.
    Methods: This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, and interesting publications referenced in already included papers.
    Results: The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems used only one or two specific clinical document types and were based on two different groups of methodologies: pattern matching and machine learning. Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries.
    Conclusions: In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and "over-scrubbing" are discussed in this publication.
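
    The pattern-matching and dictionary-based approach mentioned above can be illustrated with a minimal sketch. The regular expressions and name list below are invented for this example and are not the rules used by any of the reviewed systems.

```python
# Minimal illustration of dictionary- and pattern-based de-identification.
# The patterns and the name list are invented for this example and are far
# less complete than what production systems use.
import re

NAME_DICTIONARY = {"John", "Smith", "Mary", "Jones"}

PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),          # social security numbers
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[PHONE]"),   # US phone numbers
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),    # numeric dates
]

def deidentify(text: str) -> str:
    # Replace PHI matched by regular-expression patterns.
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    # Replace tokens found in the name dictionary.
    tokens = [
        "[NAME]" if token.strip(".,") in NAME_DICTIONARY else token
        for token in text.split()
    ]
    return " ".join(tokens)

note = "Seen by Dr. Smith on 03/14/2009; call 555-123-4567 with results."
print(deidentify(note))
```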

    A Cognitive-Social Description of Exceptional Children

    No full text